

Search for: All records

Creators/Authors contains: "Ferdman, Michael"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Luiz Barroso started his career at Digital Equipment Corporation, investigating workload-optimized multiprocessor server architectures marketed to enterprises in the 1990s. These high-margin, low-volume products lost their market to more cost-effective enterprise servers built from high-volume desktop CPUs riding Moore’s law. The enterprise market has slowly transitioned to the cloud, where desktop PCs have formed the backbone of computing in data centers since the early 2000s to minimize cost and maximize the return on investment. Moving forward, in the absence of Moore’s law, future servers require a clean-slate, cross-stack design to scale in compute, communication, and storage capacity while reducing operational, capital, and environmental costs.
  2. Yang, Chia-Lin (Ed.)
    Server applications exhibit a high degree of code repetition because they handle many similar requests. In turn, repeated execution of the same code, often with identical inputs, highlights an inefficiency in the execution of server software and suggests memoization as a way to improve performance. Memoization has been extensively explored in software, and several hardware and hardware-assisted memoization schemes have been proposed in the literature. However, these works targeted memoization of mathematical or algorithmic processing, whereas server applications call for a different approach. We observe that the opportunity for memoization in servers arises not from eliminating the repetition of complex computation, but from eliminating the repetition of software orchestration code. This work studies hardware memoization in servers, ultimately focusing on one pattern: instruction sequences starting with indirect jumps. We explore how an out-of-order pipeline can be extended to support memoization of these instruction sequences, demonstrating the potential of hardware memoization for servers. Using 26 applications to make our case (3 CloudSuite workloads and 23 vSwarm serverless functions), we show how targeting just this one pattern of instruction sequences can memoize over 10% (up to 15.6%) of the dynamically executed instructions in these server applications.
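    To make the idea above concrete, here is a minimal software analogue of the pattern, not the paper's pipeline mechanism: it caches the result of a handler reached through an indirect call, keyed by the call target and its input, so a repeated (target, input) pair skips re-execution. The names Handler and dispatch_memoized are illustrative, and the scheme assumes the memoized code has no side effects.

        #include <cstdint>
        #include <map>
        #include <string>
        #include <utility>

        // A request handler reached through an indirect call (e.g. via a jump table).
        using Handler = std::string (*)(std::uint64_t);

        // Reuse a previously computed result when the same (target, input) pair
        // repeats; otherwise execute the handler and record its output.
        std::string dispatch_memoized(Handler target, std::uint64_t input) {
            static std::map<std::pair<std::uintptr_t, std::uint64_t>, std::string> memo;
            const auto key = std::make_pair(reinterpret_cast<std::uintptr_t>(target), input);
            auto it = memo.find(key);
            if (it != memo.end())
                return it->second;               // repetition: skip re-execution
            std::string result = target(input);  // first occurrence: run and record
            memo.emplace(key, result);
            return result;
        }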
  3. Modern last-level caches are partitioned into slices that are spread across the chip, giving rise to varying access latencies dictated by the physical location of the accessing core and the cache slice being accessed. Although prior work has shown that dynamically determining the best location for blocks within such Non-Uniform Cache Access (NUCA) architectures can provide significant performance benefits, current hardware does not implement this functionality. Instead, modern processors hash blocks across the LLC slices, obscuring the non-uniform architecture of the underlying cache and forfeiting the performance benefits of placing data in the nearest cache slices. Moreover, while prior work advocated improving performance by delegating control over block placement to the operating system at page granularity, modern processor hardware thwarts these approaches by hashing cache slice selection at cache block granularity. In this work, we make two observations that enable us to improve software performance on modern NUCA architectures. First, we find that software can undo the hashing performed by hardware and efficiently manage data placement at cache block granularity. Second, we find that the complexity of fine-grained data placement can be hidden from the developer by embedding it in the dynamic memory allocator. Leveraging these observations, we design a new specialized memory allocator, NUCAlloc, suitable for use with C++ containers such as std::map and std::set. NUCAlloc handles the complexity of NUCA-aware block placement, improving the performance of containers by placing their data in the nearest LLC slices. We demonstrate that our NUCAlloc prototype consistently outperforms std::allocator and jemalloc for LLC-resident containers, improving performance by up to 20% in both single-threaded and multi-threaded software.
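    The allocator-level idea can be sketched as follows; this is not the NUCAlloc implementation, and slice_of, nearest_slice_to_this_core, and the retry bound are placeholders standing in for the reverse-engineered slice hash and topology lookup a real system would need. The sketch hands out cache-block-aligned chunks and keeps only those whose address maps to the desired slice.

        #include <cstddef>
        #include <cstdint>
        #include <cstdlib>
        #include <new>
        #include <vector>

        constexpr int kNumSlices = 8;
        int slice_of(const void* p) {                    // placeholder slice hash
            return static_cast<int>((reinterpret_cast<std::uintptr_t>(p) >> 6) % kNumSlices);
        }
        int nearest_slice_to_this_core() { return 0; }   // placeholder topology lookup

        // Allocator skeleton: draw cache-block-sized chunks until one lands in the
        // slice nearest the calling core; set wrong-slice chunks aside for reuse.
        template <class T>
        struct nuca_allocator {
            using value_type = T;
            nuca_allocator() = default;
            template <class U> nuca_allocator(const nuca_allocator<U>&) {}

            T* allocate(std::size_t n) {
                const int want = nearest_slice_to_this_core();
                const std::size_t bytes = ((n * sizeof(T) + 63) / 64) * 64;
                for (int attempt = 0; attempt < 64; ++attempt) {
                    void* p = std::aligned_alloc(64, bytes);
                    if (!p) throw std::bad_alloc();
                    if (slice_of(p) == want) return static_cast<T*>(p);
                    set_aside_.push_back(p);             // wrong slice: keep for later reuse
                }
                void* fallback = set_aside_.back();      // bounded effort: accept a miss
                set_aside_.pop_back();
                return static_cast<T*>(fallback);
            }
            void deallocate(T* p, std::size_t) { std::free(p); }

            static std::vector<void*> set_aside_;        // a real allocator would recycle these
        };
        template <class T> std::vector<void*> nuca_allocator<T>::set_aside_;

        template <class T, class U>
        bool operator==(const nuca_allocator<T>&, const nuca_allocator<U>&) { return true; }
        template <class T, class U>
        bool operator!=(const nuca_allocator<T>&, const nuca_allocator<U>&) { return false; }

        // Example use with a standard container, mirroring the target use case:
        // std::map<int, int, std::less<int>, nuca_allocator<std::pair<const int, int>>> m;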
  4. State machine replication (SMR) is a core mechanism for building highly available and consistent systems. In this paper, we propose Waverunner, a new approach to accelerate SMR using FPGA-based SmartNICs. Our approach does not implement the entire SMR system in hardware; instead, it is a hybrid software/hardware system. We make the observation that, despite the complexity of SMR, the most common routine, data replication, is actually simple. The complex parts (leader election, failure recovery, etc.) are rarely used in modern datacenters, where failures are only occasional. These complex routines are not performance critical; their software implementations are fast enough and do not need acceleration. Therefore, our system uses FPGA assistance to accelerate data replication and leaves the rest to the traditional software implementation of SMR. Our Waverunner approach is beneficial in both common-case and rare-case situations. In the common case, the system runs at the speed of the network, with a 99th percentile latency of 1.8 µs achieved without batching on minimum-size packets at network line rate (85.5 Gbps in our evaluation). In rare cases, to handle uncommon situations such as leader failure and failure recovery, the system uses traditional software to guarantee correctness, which is much easier to develop and maintain than a hardware-based implementation. Overall, our experience confirms Waverunner as an effective and practical solution for hardware-accelerated SMR, achieving most of the benefits of hardware acceleration with minimal added complexity and implementation effort.
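    The common-case/rare-case split can be pictured with the control-flow sketch below; SmartNicReplicator, SoftwareSmr, and HybridSmr are hypothetical names with stub bodies, standing in for the FPGA data path and a conventional software SMR library rather than Waverunner's actual interfaces.

        #include <string_view>

        // Stand-in for the FPGA-backed fast path; returns false when it cannot
        // handle a request (e.g. the leader changed).
        struct SmartNicReplicator {
            bool append_and_replicate(std::string_view) { return true; }  // stub
        };

        // Stand-in for a conventional software SMR implementation that owns the
        // complex, rarely exercised logic (leader election, failure recovery, ...).
        struct SoftwareSmr {
            void recover() {}                    // stub
            void append(std::string_view) {}     // stub
        };

        class HybridSmr {
        public:
            void replicate(std::string_view entry) {
                // Common case: replicate at network speed through the NIC.
                if (fast_.append_and_replicate(entry)) return;
                // Rare case: fall back to software, which is slower but guarantees
                // correctness and is far simpler to maintain than hardware logic.
                slow_.recover();
                slow_.append(entry);
            }
        private:
            SmartNicReplicator fast_;
            SoftwareSmr slow_;
        };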
  5. Motivation: In the past few years, researchers have proposed numerous indexing schemes for searching large datasets of raw sequencing experiments. Most of these proposed indexes are approximate (i.e. with one-sided errors) in order to save space. Recently, researchers have published exact indexes (Mantis, VariMerge and Bifrost) that can serve as colored de Bruijn graph representations in addition to serving as k-mer indexes. This new type of index is promising because it has the potential to support more complex analyses than simple searches. However, in order to be useful as indexes for large and growing repositories of raw sequencing data, they must scale to thousands of experiments and support efficient insertion of new data.
    Results: In this paper, we show how to build a scalable and updatable exact raw sequence-search index. Specifically, we extend Mantis using the Bentley–Saxe transformation to support efficient updates; we call the result Dynamic Mantis. We demonstrate Dynamic Mantis’s scalability by constructing an index of ≈40K samples from SRA, adding samples one at a time to an initial index of 10K samples. Compared to VariMerge and Bifrost, Dynamic Mantis is more efficient in terms of index-construction time and memory, query time and memory, and index size. In our benchmarks, VariMerge and Bifrost scaled to only 5K and 80 samples, respectively, while Dynamic Mantis scaled to more than 39K samples. Queries were over 24× faster in Mantis than in Bifrost (VariMerge does not immediately support the general search queries we require). Dynamic Mantis indexes were about 2.5× smaller than Bifrost’s indexes and about half as big as VariMerge’s indexes.
    Availability and implementation: The Dynamic Mantis implementation is available at https://github.com/splatlab/mantis/tree/mergeMSTs.
    Supplementary information: Supplementary data are available at Bioinformatics online.
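    The Bentley–Saxe transformation mentioned above can be sketched generically; the toy below dynamizes a sorted-vector "static index" rather than Mantis itself, so the class and method names are illustrative. Level i holds either nothing or roughly 2^i keys; an insertion cascades merges upward until it finds an empty level, and a query probes every non-empty level.

        #include <algorithm>
        #include <cstddef>
        #include <iterator>
        #include <vector>

        class BentleySaxeIndex {
        public:
            // Insert one key: treat it as a singleton batch and merge it upward.
            void insert(int key) {
                std::vector<int> carry{key};
                std::size_t i = 0;
                while (i < levels_.size() && !levels_[i].empty()) {
                    // Merge the carried batch with the occupied level and continue.
                    std::vector<int> merged;
                    std::merge(carry.begin(), carry.end(),
                               levels_[i].begin(), levels_[i].end(),
                               std::back_inserter(merged));
                    levels_[i].clear();
                    carry = std::move(merged);
                    ++i;
                }
                if (i == levels_.size()) levels_.emplace_back();
                levels_[i] = std::move(carry);   // first empty level absorbs the batch
            }

            // A query probes every non-empty level; each level stays sorted.
            bool contains(int key) const {
                for (const auto& level : levels_)
                    if (std::binary_search(level.begin(), level.end(), key))
                        return true;
                return false;
            }

        private:
            std::vector<std::vector<int>> levels_;  // level i holds 0 or 2^i keys
        };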